Background
The initial interest in this project stemmed from an Atlantic article about the sexism of Wikipedia. The statistics were staggering. Fewer than 10 percent of Wikipedia editors are women, and few of these editors are experienced (having made at least 500 edits). The lack of women editors leads to a lack of pages for notable women. For example, there are at least 4,400 female scientists who meet Wikipedia's standards of notability but have not had pages created for them. Even when pages for notable women are made, studies show that such pages are more likely to mention their gender and relationships than the pages of notable men. In this project, we wanted to explore further how the Wikipedia pages of famous men and women differ, by looking closely at the trends surrounding the Wikipedia pages of the 100 most famous men and women.
Introduction
The main question that we set out to answer in this project was the following: How do the Wikipedia pages of famous women differ from those of famous men?
To answer this question, we first generated a list of the top 100 'most famous' women and men using Google's PageRank. We chose PageRank as our measure of fame because we wanted a source extrinsic to Wikipedia, to avoid biases that Wikipedia's own metrics may have (although similar biases in Google's algorithm are of course also possible). The top 10 most famous men and women based on this metric are listed below:
| Rank | Women | Men |
|---|---|---|
| 1 | Elizabeth II | Napoleon |
| 2 | Queen Victoria | Barack Obama |
| 3 | Mary (mother of Jesus) | George W. Bush |
| 4 | Elizabeth I | William Shakespeare |
| 5 | Margaret Thatcher | Jesus |
| 6 | Madonna (entertainer) | Adolf Hitler |
| 7 | Hillary Clinton | Franklin D. Roosevelt |
| 8 | Catherine the Great | Aristotle |
| 9 | Beyoncé | Bill Clinton |
| 10 | Britney Spears | Ronald Reagan |
We were interested not in the specific content of the pages, but rather in features of these pages and in user interaction with them. With this in mind, we chose to investigate this question by looking at the following attributes:
- The number of backlinks (links to a given Wikipedia page from other Wikipedia pages)
- The number of revisions per page
- The size of revisions to these pages
- The number of unique editors per page
- The amount of text per page
- The language used on these pages (main pages, as well as talk pages)
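Several of the attributes above can be derived from a page's revision history. The following is a minimal sketch, not the code used in this project. It assumes each revision is a dict with a `user` field and a `size` field (the page size in bytes after that edit), listed oldest first, which is the shape of data the MediaWiki API can return. The function names and the choice to measure "size of a revision" as the absolute change in page size are our assumptions for illustration.

```python
def summarize_revisions(revisions):
    """Summarize a revision history given oldest-first.

    Each revision is assumed to be a dict with:
      'user': name of the editor who made the revision
      'size': page size in bytes after the revision
    """
    sizes = [rev["size"] for rev in revisions]
    # Size of each revision, taken here as the absolute change in page size.
    # The first revision is measured against an empty page (size 0).
    deltas = [abs(cur - prev) for prev, cur in zip([0] + sizes[:-1], sizes)]
    editors = {rev["user"] for rev in revisions}
    return {
        "num_revisions": len(revisions),
        "num_unique_editors": len(editors),
        "mean_abs_size_change": sum(deltas) / len(deltas) if deltas else 0.0,
    }


def word_count(text):
    """Amount of text on a page, counted as whitespace-separated words."""
    return len(text.split())


# Hypothetical toy history, for illustration only.
example = [
    {"user": "A", "size": 100},
    {"user": "B", "size": 150},
    {"user": "A", "size": 120},
]
print(summarize_revisions(example))
```

Counting unique editors separately from revisions matters because a high revision count could come from a handful of very active editors.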
The top 5 most famous men and women on Wikipedia
We first decided to look at the Wikipedia pages of only the top 5 men and women.
One attribute we looked at was backlinks (the number of links to a given Wikipedia page from other Wikipedia pages). We intended to use this feature as one metric of the page's popularity and of its connectedness to other pages. Results from the top 5 pages (for men and women) show that in general (with the exception of the most popular man and woman), the pages of famous men have more backlinks than the pages of famous women of the same rank.
We next looked at the number of revisions per page (over the lifetime of the page). The results show a similar trend to that of the backlinks: the number of revisions to the Wikipedia page of a famous man is greater than the number of revisions to the Wikipedia page of a famous woman of equal 'fame' (PageRank).
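Backlink counts of this kind can be obtained from the MediaWiki API's `list=backlinks` query. The sketch below is one possible way to do it and is not necessarily how the counts in this project were collected. The helper names, the `User-Agent` string, and the restriction to article-namespace links (`blnamespace=0`) are assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params):
    """Send one GET request to the MediaWiki API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(
        url, headers={"User-Agent": "backlink-count-sketch/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def count_backlinks(title, fetch=fetch_json):
    """Count the pages that link to `title`, following API continuation."""
    params = {
        "action": "query",
        "list": "backlinks",
        "bltitle": title,
        "blnamespace": 0,  # assumption: count links from article pages only
        "bllimit": "max",
        "format": "json",
    }
    total = 0
    while True:
        reply = fetch(params)
        total += len(reply.get("query", {}).get("backlinks", []))
        if "continue" not in reply:
            return total
        # The API returns the parameters needed to request the next batch.
        params = {**params, **reply["continue"]}
```

Calling `count_backlinks("Napoleon")` would query the live API. The `fetch` argument can be replaced with a stub for offline testing.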
We then looked at the number of unique editors per page (over the lifetime of the page). We wanted to see whether the observed discrepancy in the number of revisions could be explained by a small number of editors making multiple edits. However, when we plotted the number of unique editors, we saw the same trend that we observed for revisions: a greater number of unique editors for the pages of men with a given rank compared to the pages of women with equal rank. Thus, it seems that there are simply more edits and more editors for these men's pages.
Finally, we looked at the amount of text (number of words) per page. The results generally show that the Wikipedia page of a famous man contains more text than the Wikipedia page of a famous woman of equal 'fame' (PageRank).
Analysis for the top 100 most famous men and women
Intrigued by our results from the top 5 most famous men and women, we decided to look at the same features for a larger population (100 men and 100 women) to see whether the trends that we observed generalize. One reason for repeating the analysis with more people was that the top 5 most 'famous' men and women (according to PageRank) represented rather specific categories, namely British royalty and politicians (Elizabeth II, Queen Victoria, Elizabeth I, Margaret Thatcher) and religious figures (Jesus, Mary), which were not representative of the full top 100 list. To determine whether the results we observed reflected differences between the pages of famous men and women on Wikipedia, or were instead biased by other specific attributes of these 10 people, we expanded our analysis to 100 men and 100 women, who represented a more diverse group. You can see the top 100 men and women broken down into broad categories below:
The distribution of these categories among men and women is also interesting, and revealing about the types of professions that make men and women famous. For example, while the most common categories for famous men were Political (Historical) (pre-1900) (31%) and Political (Current) (post-1900) (28%), the most common category for famous women was Artists and Celebrities (Current) (post-1900) (51%).
Number of Backlinks
The mean number of backlinks to the Wikipedia pages of famous men is significantly greater than the mean number of backlinks to the Wikipedia pages of famous women (two-tailed t-test: t = -8.27, p < 0.001).
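For readers unfamiliar with the test, the sketch below shows how a two-sample t statistic can be computed with the Python standard library. It uses Welch's unequal-variance form, which is an assumption on our part: the text above does not say which variant of the t-test was run. The negative sign of the reported statistic is consistent with the women's sample being passed first, but that ordering is also an assumption. The numbers in the example are toy values, not project data.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for Welch's two-sample t-test.

    t is negative when the mean of sample_a is below the mean of sample_b.
    """
    mean_a = statistics.fmean(sample_a)
    mean_b = statistics.fmean(sample_b)
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = statistics.variance(sample_a) / len(sample_a)
    se2_b = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Hypothetical toy samples, for illustration only.
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 3), round(df, 3))
```

Turning `t` and `df` into a two-tailed p-value requires the t-distribution, which the standard library does not provide. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the statistic and the p-value together.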